Abstract: A constant-rate encoder-decoder pair is presented
for a fairly large family of two-dimensional (2-D) constraints.
Encoding and decoding are done in a row-by-row manner, and the
encoder is sliding-block decodable.

Essentially, the 2-D constraint is turned into a set of indepen- 
dent and relatively simple one-dimensional (1-D) constraints; this 
is done by dividing the array into fixed-width vertical strips. Each 
row in the strip is seen as a symbol, and a graph presentation 
of the respective 1-D constraint is constructed. The maxentropic 
stationary Markov chain on this graph is next considered: a 
perturbed version of the corresponding probability distribution 
on the edges of the graph is used in order to build an encoder 
which operates in parallel on the strips. This perturbation is 
found by means of a network flow, with upper and lower bounds 
on the flow through the edges. 

A key part of the encoder is an enumerative coder for constant- 
weight binary words. A fast realization of this coder is shown, 
using floating-point arithmetic. 

I. Introduction 

Let $G = (V, E, L)$ be an edge-labeled directed graph
(referred to hereafter simply as a graph), where $V$ is the vertex
set, $E$ is the edge set, and $L : E \to \Sigma$ is the edge labeling
taking values on a finite alphabet $\Sigma$ [15, §2.1]. We require
that the labeling $L$ be deterministic: edges that start at the same
vertex have distinct labels. We further assume that $G$ has finite
memory [15, §2.2.3]. The one-dimensional (1-D) constraint
$S = S(G)$ that is presented by $G$ is defined as the set of
all words that are generated by paths in $G$ (i.e., the words
are obtained by reading off the edge labels of such paths).
Examples of 1-D constraints include runlength-limited (RLL)
constraints [15, §1.1.1], symmetric runlength-limited (SRLL)
constraints [10], and the charge constraints [15, §1.1.2]. The
capacity of $S$ is given by

$$\mathrm{cap}(S) = \lim_{\ell \to \infty} \frac{1}{\ell} \cdot \log_2 \left| S \cap \Sigma^{\ell} \right| .$$
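As a numerical illustration of the capacity, the sketch below relies on the standard fact that, for a deterministic irreducible graph, the capacity equals the base-2 logarithm of the largest eigenvalue of the adjacency matrix of the graph. The function name `capacity`, the use of plain power iteration, and the golden-mean example graph are illustrative assumptions, not part of the paper; power iteration is assumed to converge, which holds when the adjacency matrix is primitive.

```python
import math

def capacity(adjacency, iterations=200):
    """Estimate cap(S(G)) as log2 of the Perron eigenvalue of the adjacency matrix.

    Assumes the matrix is primitive, so that power iteration converges.
    """
    n = len(adjacency)
    x = [1.0] * n
    growth = 1.0
    for _ in range(iterations):
        y = [sum(adjacency[i][j] * x[j] for j in range(n)) for i in range(n)]
        growth = max(y)                    # converges to the largest eigenvalue
        x = [value / growth for value in y]
    return math.log2(growth)

# Hypothetical example: the golden-mean constraint (no two consecutive 1s).
golden_mean = [[1, 1],
               [1, 0]]
cap_golden = capacity(golden_mean)         # log2 of the golden ratio
```

For larger graphs one would normally use a numerical eigenvalue routine instead; the loop above only keeps the sketch free of external dependencies.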

An $M$-track parallel encoder for $S = S(G)$ at rate $R$ is
defined as follows (see Figure 1).

1) At stage $t = 0, 1, 2, \ldots$, the encoder (which may be
state-dependent) receives as input $M \cdot R$ (unconstrained)
information bits.

2) The output of the encoder at stage $t$ is a word
$g^{(t)} = (g_k^{(t)})_{k=1}^{M}$ of length $M$ over $\Sigma$.

3) For $1 \le k \le M$, the $k$th track $\gamma_k = (g_k^{(t)})_{t=0}^{\ell-1}$ of any
given length $\ell$ belongs to $S$.

4) There are integers $m, a \ge 0$ such that the encoder is
$(m,a)$-sliding-block decodable (in short, $(m,a)$-SBD):
for $t \ge m$, the $M \cdot R$ information bits which were input
at stage $t$ are uniquely determined by (and can be efficiently
calculated from) $g^{(t-m)}, g^{(t-m+1)}, \ldots, g^{(t+a)}$.

Fig. 1. Array corresponding to an $M$-track parallel encoder.

The decoding window size of the encoder is m + a + 1, and it is 
desirable to have a small window to avoid error propagation. 
In this work, we will be mainly focusing on the case where 
a = 0, in which case the decoding requires no look-ahead. 

In [12], it was shown that by introducing parallelism, one 
can reduce the window size, compared to conventional serial 
encoding. Furthermore, it was shown that as M tends to
infinity, there are (0, 0)-SBD parallel encoders whose rates
approach $\mathrm{cap}(S(G))$. A key step in [12] is using some perturbation
of the conditional probability distribution on the edges of
G, corresponding to the maxentropic stationary Markov chain 
on G. However, it is not clear how this perturbation should be 
done: a naive method will only work for unrealistically large 
M. Also, the proof in [12] of the (0, 0)-SBD property is only 
probabilistic and does not suggest encoders and decoders that 
have an acceptable running time. 

In this work, we aim at making the results of [12] more
tractable. At the expense of possibly increasing the memory
of the encoder (up to the memory of G) we are able to
define a suitable perturbed distribution explicitly, and provide
an efficient algorithm for computing it. Furthermore, the
encoding and decoding can be carried out in time complexity
$O(M \log^2 M \log\log M)$, where the multiplying constants in
the $O(\cdot)$ term are polynomially large in the parameters of G.

Denote by diam(G) the diameter of G (i.e., the longest
shortest path between any two vertices in G) and let
$A_G = (a_{i,j})$ be the adjacency matrix of G, i.e., $a_{i,j}$ is the number
of edges in G that start at vertex i and terminate in vertex j.
Our main result, specifying the rate of our encoder, is given 
in the next theorem. 

Theorem 1: Let G be a deterministic graph with memory
m. For M sufficiently large, one can efficiently construct an
M-track (m, 0)-SBD parallel encoder for $S = S(G)$ at a rate
R such that

$$R \ge \mathrm{cap}(S(G)) \left( 1 - \frac{|V| \, \mathrm{diam}(G)}{2M} \right) - O\!\left( \frac{|V|^2 \log (M \cdot a_{\max}/a_{\min})}{M - |V| \, \mathrm{diam}(G)/2} \right) , \tag{1}$$

where $a_{\min}$ (respectively, $a_{\max}$) is the smallest (respectively,
largest) nonzero entry in $A_G$.

The structure of this paper is as follows. In Section II
we show how parallel encoding can be used to construct an
encoder for a 2-D constraint. As we will show, a parallel
encoder is essentially defined through what we term a multiplicity
matrix. Section III defines how our parallel encoder
works, assuming its multiplicity matrix is given. Then, in
Section IV we show how to efficiently calculate a good
multiplicity matrix. Although 2-D constraints are our main
motivation, Section V shows how our method can be applied
to 1-D constraints. Section VI defines two methods by which
the rate of our encoder can be slightly improved. Finally, in
Section VII we show a method of efficiently realizing a key
part of our encoding procedure.

II. Two-dimensional constraints 

Our primary motivation for studying parallel encoding is to 
show an encoding algorithm for a family of two-dimensional 
(2-D) constraints. 

The concept of a 1-D constraint can formally be generalized 
to two dimensions (see [12, §1]). Examples of 2-D constraints 
are 2-D RLL constraints [14], 2-D SRLL constraints [10], and 
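the so-called square constraint [16]. Let $\mathbb{S}$ be a given 2-D
constraint over a finite alphabet $\Sigma$. We denote by $\mathbb{S}[\ell, w]$ the
set of all $\ell \times w$ arrays in $\mathbb{S}$. The capacity of $\mathbb{S}$ [5] is given by

$$\mathrm{cap}(\mathbb{S}) = \lim_{\ell, w \to \infty} \frac{1}{\ell \cdot w} \cdot \log_2 \left| \mathbb{S}[\ell, w] \right| .$$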

Suppose we wish to encode information to an $\ell \times w$ array
which must satisfy the constraint $\mathbb{S}$; namely, the array must
be an element of $\mathbb{S}[\ell, w]$. As a concrete example, consider the
square constraint [16]: its elements are all the binary arrays in
which no two '1' symbols are adjacent on a row, column, or
diagonal.
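The definition of the square constraint can be stated as a short membership test. The following is a minimal sketch; the function name `satisfies_square_constraint` and the two sample arrays are illustrative assumptions and do not come from the paper.

```python
def satisfies_square_constraint(array):
    """Return True if no two 1s are adjacent on a row, column, or diagonal."""
    rows = len(array)
    for r, row in enumerate(array):
        for c, bit in enumerate(row):
            if bit != 1:
                continue
            # Looking right, lower-left, down, and lower-right visits every
            # adjacent pair of cells exactly once.
            for dr, dc in ((0, 1), (1, -1), (1, 0), (1, 1)):
                rr, cc = r + dr, c + dc
                if rr < rows and 0 <= cc < len(array[rr]) and array[rr][cc] == 1:
                    return False
    return True

# Hypothetical sample arrays.
good = [[1, 0, 1, 0],
        [0, 0, 0, 0],
        [0, 1, 0, 0]]
bad = [[1, 0, 0, 0],
       [0, 1, 0, 0]]   # two 1s touch diagonally
```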

We first partition our array into two alternating types of
vertical strips: data strips having width $w_d$, and merging strips
having width $w_m$. In our example, let $w_d = 4$ and $w_m = 1$
(see Figure 2).

Fig. 2. Binary array satisfying the square constraint, partitioned into data
strips of width $w_d = 4$ and merging strips of width $w_m = 1$.



Secondly, we select a graph $G = (V, E, L)$ with a labeling
$L : E \to \mathbb{S}[1, w_d]$ such that $S(G) \subseteq \mathbb{S}$, i.e., each path of
length $\ell$ in $G$ generates a (column) word which is in $\mathbb{S}[\ell, w_d]$.
We then fill up the data strips of our $\ell \times w$ array with $\ell \times w_d$
arrays corresponding to paths of length $\ell$ in $G$. Thirdly, we
assume that the choice of $w_m$ allows us to fill up the merging
strips in a row-by-row (causal) manner, such that our $\ell \times w$
array is in $\mathbb{S}$. Any 2-D constraint $\mathbb{S}$ for which such $w_d$, $w_m$,
and $G$ can be found is in the family of constraints we can
code for (for example, the 2-D SRLL constraints belong to
this family [10]).

Consider again the square constraint: a graph which produces
all $\ell \times w_d$ arrays that satisfy this constraint is given
in Figure 3. Also, for $w_m = 1$, we can take the merging
strips to be all-zero. (There are cases, such as the 2-D SRLL
constraints, where determining the merging strips may be less
trivial [10].)

Fig. 3. Graph $G$ whose paths generate all $\ell \times 4$ arrays satisfying the square
constraint. The label of an edge is given by the label of the vertex it enters.



Suppose we have an $(m,0)$-SBD parallel encoder for
$S = S(G)$ at rate $R$ with $M = (w + w_m)/(w_d + w_m)$ tracks. We
may use this parallel encoder to encode information in a row-by-row
fashion to our $\ell \times w$ array: at stage $t$ we feed $M \cdot R$
information bits to our parallel encoder. Let $g^{(t)} = (g_k^{(t)})_{k=1}^{M}$
be the output of the parallel encoder at stage $t$. We write $g_k^{(t)}$
to row $t$ of the $k$th data strip, and then appropriately fill up
row $t$ of the merging strips. Decoding of a row in our array
can be carried out based only on the contents of that row and
the previous $m$ rows.

Since $M \cdot R$ information bits are mapped to
$M \cdot w_d + (M - 1) \cdot w_m$ symbols in $\Sigma$, the rate at which we encode information
to the array is

$$\frac{R}{w_d + w_m (1 - 1/M)} \le \frac{\mathrm{cap}(S(G))}{w_d + w_m (1 - 1/M)} .$$

We note the following tradeoff: Typically, taking larger values
of $w_d$ (while keeping $w_m$ constant) will increase the right-hand
side of the above inequality. However, the number of
vertices and edges in $G$ will usually grow exponentially with
$w_d$. Thus, $w_d$ is taken to be reasonably small.

Note that in our scheme, a single error generally results in 
the loss of information stored in the respective vertical sliding- 
block window. Namely, a single corrupted entry in the array 
may cause the loss of m + 1 rows. Thus, our method is only 
practical if we assume an error model in which whole rows 
are corrupted by errors. This is indeed the case if each row 
is protected by an error-correcting code (for example, by the 
use of unconstrained positions [7]). 

III. Description of the encoder 
Let $N$ be a positive integer which will shortly be specified.
The $N$ words $\gamma_k = (g_k^{(t)})_{t=0}^{\ell-1}$, $1 \le k \le N$, that we will
be writing to the first $N$ tracks are all generated by paths of
length $\ell$ in $G$. In what follows, we find it convenient to regard
the $\ell \times N$ arrays $(\gamma_k)_{k=1}^{N} = (g_k^{(t)})_{t=0,\,k=1}^{\ell-1,\;N}$ as (column) words
of length $\ell$ of some new 1-D constraint, which we define next.

The $N$th Kronecker power of $G = (V, E, L)$, denoted by
$G^{\otimes N} = (V^N, E^N, L^N)$, is defined as follows. The vertex set
$V^N$ is simply the $N$th Cartesian power of $V$; that is,

$$V^N = \{ (v_1, v_2, \ldots, v_N) : v_k \in V \} .$$

An edge $\mathbf{e} = (e_1, e_2, \ldots, e_N) \in E^N$ goes from vertex
$\mathbf{v} = (v_1, v_2, \ldots, v_N) \in V^N$ to vertex $\mathbf{v}' = (v'_1, v'_2, \ldots, v'_N) \in V^N$
and is labeled $L^N(\mathbf{e}) = (b_1, b_2, \ldots, b_N)$ whenever for all
$1 \le k \le N$, $e_k$ is an edge from $v_k$ to $v'_k$ labeled $b_k$.

Note that a path of length $\ell$ in $G^{\otimes N}$ is just a handy way
to denote $N$ paths of length $\ell$ in $G$. Accordingly, the $\ell \times N$
arrays $(\gamma_k)_{k=1}^{N}$ are the words of length $\ell$ in $S(G^{\otimes N})$.

Let $G$ be as in Section I and let $A_G = (a_{i,j})$ be the
adjacency matrix of $G$. Denote by $\mathbf{1}$ the $1 \times |V|$ all-one row
vector. The description of our $M$-track parallel encoder for
$S = S(G)$ makes use of the following definition. A $|V| \times |V|$
nonnegative integer matrix $D = (d_{i,j})_{i,j \in V}$ is called a (valid)
multiplicity matrix with respect to $G$ and $M$ if

$$\mathbf{1} \cdot D \cdot \mathbf{1}^T \le M , \tag{2}$$
$$\mathbf{1} \cdot D = \mathbf{1} \cdot D^T , \quad \text{and} \tag{3}$$
$$d_{i,j} > 0 \ \text{only if} \ a_{i,j} > 0 . \tag{4}$$

(While any multiplicity matrix will produce a parallel encoder,
some will have higher rates than others. In Section IV we
show how to compute multiplicity matrices $D$ that yield rates
close to $\mathrm{cap}(S(G))$.)

Recall that we have at our disposal $M$ tracks. However, we
will effectively be using only the first $N = \mathbf{1} \cdot D \cdot \mathbf{1}^T$ tracks
in order to encode information. The last $M - N$ tracks will
all be equal to the first track, say.
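Conditions (2)-(4) are easy to check mechanically. The sketch below does so for matrices given as nested lists; the function name `is_valid_multiplicity_matrix` and the small 3-vertex matrices are illustrative assumptions rather than anything prescribed by the paper.

```python
def is_valid_multiplicity_matrix(D, A, M):
    """Check conditions (2)-(4) for a candidate multiplicity matrix D.

    D and A are square lists of lists; A is the adjacency matrix of G.
    """
    n = len(D)
    entries_ok = all(isinstance(D[i][j], int) and D[i][j] >= 0
                     for i in range(n) for j in range(n))
    total = sum(sum(row) for row in D)                                # 1 * D * 1^T
    col_sums = [sum(D[i][j] for i in range(n)) for j in range(n)]     # 1 * D
    row_sums = [sum(D[i]) for i in range(n)]                          # 1 * D^T
    support_ok = all(D[i][j] == 0 or A[i][j] > 0
                     for i in range(n) for j in range(n))             # condition (4)
    return entries_ok and total <= M and col_sums == row_sums and support_ok

# Hypothetical 3-vertex example (not taken from the text above).
A_example = [[1, 1, 0],
             [1, 0, 1],
             [1, 0, 0]]
D_example = [[4, 3, 0],
             [2, 0, 1],
             [1, 0, 0]]
```

Here `D_example` uses $N = 11$ tracks, so it is valid for any $M \ge 11$.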



Write $r = (r_i)_{i \in V} = \mathbf{1} \cdot D^T$. A vertex $\mathbf{v} = (v_k)_{k=1}^{N} \in V^N$
is a typical vertex (with respect to $D$) if for all $i$, the vertex
$i \in V$ appears as an entry in $\mathbf{v}$ exactly $r_i$ times. Also, an edge
$\mathbf{e} = (e_k)_{k=1}^{N} \in E^N$ is a typical edge with respect to $D$ if for
all $i, j \in V$, there are exactly $d_{i,j}$ entries $e_k$ which, as edges
in $G$, start at vertex $i$ and terminate in vertex $j$.

A simple computation shows that the number of outgoing
typical edges from a typical vertex equals

$$\Delta = \prod_{i \in V} \frac{r_i! \cdot \prod_{j \in V} a_{i,j}^{d_{i,j}}}{\prod_{j \in V} d_{i,j}!} \tag{5}$$

(where $0^0 = 1$). For example, in the simpler case where $G$
does not contain parallel edges ($a_{i,j} \in \{0, 1\}$), we are in effect
counting in (5) permutations with repetitions, each time for a
different vertex $i$.

The encoding process will be carried out as follows. We
start at some fixed typical vertex $\mathbf{v}^{(0)} \in V^N$. Out of the set of
outgoing edges from $\mathbf{v}^{(0)}$ we consider only typical edges. The
edge we choose to traverse is determined by the information
bits. After traversing the chosen edge, we arrive at vertex $\mathbf{v}^{(1)}$.
By (3), $\mathbf{v}^{(1)}$ is also a typical vertex, and the process starts
over. This process defines an $M$-track parallel encoder for
$S = S(G)$ at rate

$$R = R(D) = \frac{\lfloor \log_2 \Delta \rfloor}{M} .$$

This encoder is $(m,0)$-SBD, where $m$ is the memory of $G$.
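To make the count of typical edges and the resulting rate concrete, here is a minimal sketch using exact integer arithmetic. It assumes the rate is the floor of the base-2 logarithm of the count, divided by $M$; the function names and the 3-vertex matrices are illustrative assumptions.

```python
import math

def count_typical_edges(D, A):
    """Number of typical edges leaving a typical vertex (a product over vertices)."""
    delta = 1
    for i, row in enumerate(D):
        numerator = math.factorial(sum(row))          # r_i!
        denominator = 1
        for j, d in enumerate(row):
            # Python evaluates 0 ** 0 as 1, matching the convention 0^0 = 1.
            numerator *= A[i][j] ** d
            denominator *= math.factorial(d)
        delta *= numerator // denominator             # exact: a multinomial times an integer
    return delta

def encoder_rate(D, A, M):
    """floor(log2(Delta)) / M, computed exactly via bit_length."""
    delta = count_typical_edges(D, A)
    return (delta.bit_length() - 1) / M

# Hypothetical 3-vertex example without parallel edges.
A_example = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]
D_example = [[4, 3, 0], [2, 0, 1], [1, 0, 0]]
delta_example = count_typical_edges(D_example, A_example)   # 35 * 3 * 1
rate_example = encoder_rate(D_example, A_example, 12)
```

In this example the three vertices contribute $7!/(4!\,3!) = 35$, $3!/(2!\,1!) = 3$, and $1$ choices, respectively, so 6 information bits are encoded per stage.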

Consider now how we map $M \cdot R$ information bits into an
edge choice $\mathbf{e} \in E^N$ at any given stage $t$. Assuming again
the simpler case of a graph with no parallel edges, a natural
choice would be to use an instance of enumerative coding [9].
Specifically, suppose that for $0 \le \delta \le n$, a procedure for
encoding information by an $n$-bit binary vector with Hamming
weight $\delta$ were given. Suppose also that $V = \{1, 2, \ldots, |V|\}$.
We could use this procedure as follows. First, for $n = r_1$ and
$\delta = d_{1,1}$, the binary word given as output by the procedure
will define which $d_{1,1}$ of the possible $r_1$ entries in $\mathbf{e}$ will be
equal to the edge in $E$ from the vertex $1 \in V$ to itself (if no
such edge exists, then $d_{1,1} = 0$). Having chosen these entries,
we run the procedure with $n = r_1 - d_{1,1}$ and $\delta = d_{1,2}$ to
choose from the remaining $r_1 - d_{1,1}$ entries those that will
contain the edge in $E$ from $1 \in V$ to $2 \in V$. We continue this
process, until all $r_1$ entries in $\mathbf{e}$ containing an edge outgoing
from $1 \in V$ have been picked. Next, we run the procedure with
$n = r_2$ and $\delta = d_{2,1}$, and so forth. The more general case of
a graph containing parallel edges will include a preliminary
step: encoding information in the choice of the $d_{i,j}$ edges used
to traverse from $i$ to $j$ ($a_{i,j}$ options for each such edge).
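The constant-weight procedure assumed above can be realized by textbook enumerative coding with exact binomial coefficients. The sketch below is such a straightforward version, not the fast floating-point realization that the paper develops later; the function names and the lexicographic ordering are illustrative assumptions.

```python
from math import comb

def unrank_constant_weight(index, n, weight):
    """Map an integer in [0, C(n, weight)) to an n-bit word with `weight` ones."""
    word = []
    for position in range(n):
        remaining = n - position - 1
        # Number of admissible words that place a 0 in the current position.
        zeros_first = comb(remaining, weight)
        if index < zeros_first:
            word.append(0)
        else:
            word.append(1)
            index -= zeros_first
            weight -= 1
    return word

def rank_constant_weight(word):
    """Inverse of unrank_constant_weight: recover the integer from the word."""
    index = 0
    weight = sum(word)
    n = len(word)
    for position, bit in enumerate(word):
        if bit == 1:
            index += comb(n - position - 1, weight)
            weight -= 1
    return index

# Example: the 10 words of length 5 and weight 2, in lexicographic order.
words = [unrank_constant_weight(i, 5, 2) for i in range(comb(5, 2))]
```

With $n = r_1$ and weight $d_{1,1}$, such a word would mark which entries carry the self-loop at vertex 1; $\lfloor \log_2 \binom{n}{\delta} \rfloor$ information bits select the index.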

A fast implementation of enumerative coding is presented in
Section VII. The above-mentioned preliminary step makes use
of the Schönhage-Strassen integer-multiplication algorithm [3,
§7.5], and the resulting encoding time complexity is proportional¹
to $M \log^2 M \log\log M$. It turns out that this is also
the decoding time complexity. Further details are given in
Section VII.

The next section shows how to find a good multiplicity
matrix, i.e., a matrix $D$ such that $R(D)$ is close to $\mathrm{cap}(S(G))$.

IV. Computing a good multiplicity matrix 

To make the exposition of this section clearer, we
accompany it by a running example (see Figure 4).

Fig. 4. Running Example (1): Graph $G$ and the corresponding adjacency
matrix $A_G$.

Throughout this section, we assume a probability distribution
on the edges of $G$, which is the maxentropic stationary
Markov chain $\mathcal{P}$ on $G$ [15]. Without real loss of generality, we
can assume that $G$ is irreducible (i.e., strongly connected), in
which case $\mathcal{P}$ is indeed unique. Let the matrix $Q = (q_{i,j})$ be
the transition matrix induced by $\mathcal{P}$, i.e., $q_{i,j}$ is the probability
of traversing an edge from $i \in V$ to $j \in V$, conditioned on
currently being at vertex $i \in V$.

Let $\pi = (\pi_i)$ be the $1 \times |V|$ row vector corresponding to the
stationary distribution on $V$ induced by $Q$; namely, $\pi Q = \pi$
and $\sum_{i \in V} \pi_i = 1$. Let

$$M' = M - \lfloor |V| \, \mathrm{diam}(G)/2 \rfloor , \tag{6}$$

and define

$$\rho = (\rho_i) , \quad \rho_i = M' \pi_i , \quad \text{and} \quad P = (p_{i,j}) , \quad p_{i,j} = \rho_i \, q_{i,j} .$$
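The quantities $Q$, $\pi$, and $P$ can be computed numerically. The sketch below uses the standard formula for the maxentropic chain, $q_{i,j} = a_{i,j} x_j / (\lambda x_i)$, where $\lambda$ is the largest eigenvalue of the adjacency matrix and $x$ a corresponding right eigenvector. The adjacency matrix `A` is an assumption: the figure of the running example is not reproduced here, and this matrix was inferred because it agrees with the values reported for the running example. The diameter is supplied by hand, and power iteration is assumed to converge (the matrix is primitive).

```python
def maxentropic_chain(A, iterations=500):
    """Return (Q, pi) for the maxentropic stationary Markov chain on the graph of A."""
    n = len(A)
    x = [1.0] * n
    lam = 1.0
    for _ in range(iterations):                      # power iteration: A x -> lam x
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = max(y)
        x = [value / lam for value in y]
    Q = [[A[i][j] * x[j] / (lam * x[i]) for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iterations):                      # iterate pi <- pi Q
        pi = [sum(pi[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return Q, pi

# Assumed adjacency matrix (inferred, see the lead-in) and parameters.
A = [[1, 1, 0],
     [1, 0, 1],
     [1, 0, 0]]
M, diameter = 12, 2
Q, pi = maxentropic_chain(A)
M_prime = M - (len(A) * diameter) // 2               # equation (6)
rho = [M_prime * p for p in pi]
P = [[rho[i] * Q[i][j] for j in range(len(A))] for i in range(len(A))]
```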

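Running Example (2): Taking the number of tracks in our
running example (Figure 4) to be $M = 12$ gives $M' = 9$.
Also, our running example has

$$\pi = \begin{pmatrix} 0.619 & 0.282 & 0.099 \end{pmatrix} \quad \text{and} \quad Q = \begin{pmatrix} 0.544 & 0.456 & 0 \\ 0.647 & 0 & 0.353 \\ 1 & 0 & 0 \end{pmatrix} .$$

Thus,

$$\rho = \begin{pmatrix} 5.57 & 2.54 & 0.89 \end{pmatrix} \quad \text{and} \quad P = \begin{pmatrix} 3.03 & 2.54 & 0 \\ 1.65 & 0 & 0.89 \\ 0.89 & 0 & 0 \end{pmatrix} .$$

¹Actually, the time complexity for the preliminary step can be made linear
in $M$, with a negligible penalty in terms of rate: Fix $i$ and $j$, and let $\eta$ be an
integer design parameter. Assume for simplicity that $\eta \mid d_{i,j}$. The number of
vectors of length $\eta$ over an alphabet of size $a_{i,j}$ is obviously $a_{i,j}^{\eta}$. So, we can
encode $\lfloor \eta \log_2 a_{i,j} \rfloor$ bits through the choice of such a vector. Repeating this
process, we can encode $(d_{i,j}/\eta) \cdot \lfloor \eta \log_2 a_{i,j} \rfloor$ bits through the choice of
$d_{i,j}/\eta$ such vectors. The concatenation of these vectors is taken to represent
our choice of edges. Note that the encoding process is linear in $M$ for constant
$\eta$. Also, our losses (due to the floor function) become negligible for modestly
large $\eta$.

Note that

$$\rho = \mathbf{1} \cdot P^T \quad \text{and} \quad M' = \mathbf{1} \cdot P \cdot \mathbf{1}^T .$$

Also, observe that (2)-(4) hold when we substitute $P$ for
$D$. Thus, if all entries of $P$ were integers, then we could
take $D$ equal to $P$. In a way, that would be the best choice
we could have made: by using Stirling's approximation, we
could deduce that $R(D)$ approaches $\mathrm{cap}(S(G))$ as $M \to \infty$.
However, the entries of $P$, as well as $\rho$, may be non-integers.

We say that an integer matrix $\tilde{P} = (\tilde{p}_{i,j})$ is a good
quantization of $P = (p_{i,j})$ if

$$\sum_{i,j \in V} \tilde{p}_{i,j} = M' , \tag{7}$$
$$\lfloor p_{i,j} \rfloor \le \tilde{p}_{i,j} \le \lceil p_{i,j} \rceil \quad \text{for all } i, j \in V , \tag{8}$$
$$\Big\lfloor \sum_{j \in V} p_{i,j} \Big\rfloor \le \sum_{j \in V} \tilde{p}_{i,j} \le \Big\lceil \sum_{j \in V} p_{i,j} \Big\rceil \quad \text{for all } i \in V , \text{ and} \tag{9}$$
$$\Big\lfloor \sum_{i \in V} p_{i,j} \Big\rfloor \le \sum_{i \in V} \tilde{p}_{i,j} \le \Big\lceil \sum_{i \in V} p_{i,j} \Big\rceil \quad \text{for all } j \in V . \tag{10}$$

Namely, a given entry in $\tilde{P}$ is either the floor or the ceiling of
the corresponding entry in $P$, and this also holds for the sum
of entries of a given row or column in $\tilde{P}$; moreover, the sums
of entries in both $P$ and $\tilde{P}$ are exactly equal (to $M'$).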
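The definition of a good quantization translates directly into a checker: the total must equal $M'$, and every entry, row sum, and column sum must lie between the floor and the ceiling of its real-valued counterpart. In the sketch below, the function name, the rounded values of `P`, and the candidate integer matrices are illustrative assumptions (the rounded `P` is consistent with the running example, but the integer matrices are not quoted from a figure). Since floating-point sums are compared against floors and ceilings, sums that land extremely close to an integer would need extra care.

```python
import math

def is_good_quantization(P_tilde, P, M_prime):
    """Check the four good-quantization conditions for integer matrix P_tilde."""
    n = len(P)

    def between(value, real):
        return math.floor(real) <= value <= math.ceil(real)

    total_ok = sum(map(sum, P_tilde)) == M_prime                      # total equals M'
    entries_ok = all(between(P_tilde[i][j], P[i][j])
                     for i in range(n) for j in range(n))             # entries
    rows_ok = all(between(sum(P_tilde[i]), sum(P[i]))
                  for i in range(n))                                  # row sums
    cols_ok = all(between(sum(P_tilde[i][j] for i in range(n)),
                          sum(P[i][j] for i in range(n)))
                  for j in range(n))                                  # column sums
    return total_ok and entries_ok and rows_ok and cols_ok

# Assumed values, rounded to two decimals.
P = [[3.03, 2.54, 0.0],
     [1.65, 0.0, 0.89],
     [0.89, 0.0, 0.0]]
candidate = [[4, 2, 0],
             [2, 0, 1],
             [0, 0, 0]]
too_heavy_row = [[4, 3, 0],
                 [1, 0, 1],
                 [0, 0, 0]]     # first row sums to 7 > ceil(5.57)
```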

Lemma 2: There exists a matrix $\tilde{P}$ which is a good
quantization of $P$. Furthermore, such a matrix can be found by an
efficient algorithm.

Fig. 5. Flow network for the proof of Lemma 2. An edge labeled (a, b) has
lower and upper bounds a and b, respectively.

Proof: We recast (7)-(10) as an integer flow problem
(see Figures 5 and 6). Consider the following flow network,
with upper and lower bounds on the flow through the edges [4,
§6.7]. The network has the vertex set

$$\{u_\sigma\} \cup \{u_\omega\} \cup \{u_\tau\} \cup \{u'_i\}_{i \in V} \cup \{u''_j\}_{j \in V} ,$$

with source $u_\sigma$ and target $u_\tau$. Henceforth, when we refer to the
upper (lower) bound of an edge, we mean the upper (lower)
bound on the flow through it. There are four kinds of edges:

1) An edge $u_\sigma \to u_\omega$ with upper and lower bounds both
equal to $M'$.

2) $u_\omega \to u'_i$ for every $i \in V$, with the lower and upper
bounds $\lfloor \sum_{j \in V} p_{i,j} \rfloor$ and $\lceil \sum_{j \in V} p_{i,j} \rceil$, respectively.

3) $u'_i \to u''_j$ for every $i, j \in V$, with the lower and upper
bounds $\lfloor p_{i,j} \rfloor$ and $\lceil p_{i,j} \rceil$, respectively.

4) $u''_j \to u_\tau$ for every $j \in V$, with the lower and upper
bounds $\lfloor \sum_{i \in V} p_{i,j} \rfloor$ and $\lceil \sum_{i \in V} p_{i,j} \rceil$, respectively.

We claim that (7)-(10) can be satisfied if a legal integer
flow exists: simply take $\tilde{p}_{i,j}$ as the flow on the edge from $u'_i$
to $u''_j$.

It is well known that if a legal real flow exists for a flow
network with integer upper and lower bounds on the edges,
then a legal integer flow exists as well [4, Theorem 6.5].
Moreover, such a flow can be efficiently found [4, §6.7]. To
finish the proof, we now exhibit such a legal real flow:

1) The flow on the edge $u_\sigma \to u_\omega$ is $\sum_{i,j \in V} p_{i,j} = M'$.

2) The flow on an edge $u_\omega \to u'_i$ is $\sum_{j \in V} p_{i,j}$.

3) The flow on an edge $u'_i \to u''_j$ is $p_{i,j}$.

4) The flow on an edge $u''_j \to u_\tau$ is $\sum_{i \in V} p_{i,j}$.

■

Fig. 6. Running Example (3): The flow network derived from $P$ in Running
Example 2. An edge labeled a; b has lower and upper bounds $\lfloor a \rfloor$ and $\lceil a \rceil$,
respectively. A legal real flow is given by a. A legal integer flow is given by
b. The matrix $\tilde{P}$ resulting from the legal integer flow is given, as well as the
matrix $P$ (again).



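For the remaining part of this section, we assume that $\tilde{P}$ is
a good quantization of $P$ (say, $\tilde{P}$ is computed by solving the
integer flow problem in the last proof). The next lemma states
that $\tilde{P}$ "almost" satisfies (3).

Lemma 3: Let $\tilde{\rho} = (\tilde{\rho}_i) = \mathbf{1} \cdot \tilde{P}^T$ and $\tilde{r} = (\tilde{r}_i) = \mathbf{1} \cdot \tilde{P}$.
Then, for all $i \in V$,

$$\tilde{\rho}_i - \tilde{r}_i \in \{-1, 0, 1\} .$$

Proof: From (9), we get that for all $i \in V$,

$$\Big\lfloor \sum_{j \in V} p_{i,j} \Big\rfloor \le \tilde{\rho}_i \le \Big\lceil \sum_{j \in V} p_{i,j} \Big\rceil . \tag{11}$$

Recall that (3) is satisfied if we replace $D$ by $P$. Thus, by
(10), we have that (11) also holds if we replace $\tilde{\rho}_i$ by $\tilde{r}_i$. We
conclude that $|\tilde{\rho}_i - \tilde{r}_i| \le 1$. The proof follows from the fact
that entries of $\tilde{P}$ are integers, and thus so are those of $\tilde{\rho}$ and
$\tilde{r}$. ■

The following lemma will be the basis for augmenting $\tilde{P}$
so that (3) is satisfied.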

Lemma 4: Fix two distinct vertices $s, t \in V$. We can
efficiently find a $|V| \times |V|$ matrix $F^{(s,t)} = F = (f_{i,j})_{i,j \in V}$
with non-negative integer entries, such that the following three
conditions hold.

(i) $\mathbf{1} \cdot F \cdot \mathbf{1}^T \le \mathrm{diam}(G)$.

(ii) For all $i, j \in V$, $f_{i,j} > 0$ only if $a_{i,j} > 0$.

(iii) Denote $\xi = \mathbf{1} \cdot F^T$ and $x = \mathbf{1} \cdot F$. Then, for all $i \in V$,

$$x_i - \xi_i = \begin{cases} -1 & \text{if } i = s , \\ 1 & \text{if } i = t , \\ 0 & \text{otherwise} . \end{cases}$$

Proof: Let $k_1 = s, k_2, k_3, \ldots, k_{\ell+1} = t$ be the vertices
along a shortest path from $s$ to $t$ in $G$. For all $i, j \in V$, define

$$f_{i,j} = \left| \{ 1 \le h \le \ell : k_h = i \text{ and } k_{h+1} = j \} \right| . \tag{12}$$

Namely, $f_{i,j}$ is the number of edges from $i$ to $j$ along the
path.

Conditions (i) and (ii) easily follow from (12). Condition
(iii) follows from the fact that $\xi_i$ ($x_i$) is equal to the number
of edges along the path for which $i$ is the start (end) vertex
of the edge. ■

The matrix $\tilde{P}$ will be the basis for computing a good
multiplicity matrix $D$, as we demonstrate in the proof of the
next theorem.

Theorem 5: Let $\tilde{P} = (\tilde{p}_{i,j})$ be a good quantization of $P$.
There exists a multiplicity matrix $D = (d_{i,j})$ with respect to
$G$ and $M$, such that

1) $d_{i,j} \ge \tilde{p}_{i,j}$ for all $i, j \in V$, and

2) $M' \le \mathbf{1} \cdot D \cdot \mathbf{1}^T \le M$

(where $M'$ is as defined in (6)). Moreover, the matrix $D$ can
be found by an efficient algorithm.

Proof: Consider a vertex $i \in V$. If $\tilde{r}_i > \tilde{\rho}_i$, then we say
that vertex $i$ has a surplus of $\tilde{r}_i - \tilde{\rho}_i$. In this case, by Lemma 3
we have that the surplus is equal to 1. On the other hand, if
$\tilde{r}_i < \tilde{\rho}_i$ then vertex $i$ has a deficiency of $\tilde{\rho}_i - \tilde{r}_i$, which again
is equal to 1.

Of course, since $\sum_{i \in V} \tilde{\rho}_i = \sum_{i \in V} \tilde{r}_i = M'$, the total
surplus is equal to the total deficiency, and both are denoted
by Surp:

$$\mathrm{Surp} = \sum_{i \in V} \max \{ 0, \tilde{r}_i - \tilde{\rho}_i \} = - \sum_{i \in V} \min \{ 0, \tilde{r}_i - \tilde{\rho}_i \} . \tag{13}$$

Denote the vertices with surplus as $(s_k)_{k=1}^{\mathrm{Surp}}$ and the vertices
with deficiency as $(t_k)_{k=1}^{\mathrm{Surp}}$. Recalling the matrix $F$ from
Lemma 4, we define

$$D = \tilde{P} + \sum_{k=1}^{\mathrm{Surp}} F^{(s_k, t_k)} .$$

We first show that $D$ is a valid multiplicity matrix. Note that
$\mathrm{Surp} \le |V|/2$. Thus, (2) follows from (6), (7), and (i). The
definitions of surplus and deficiency vertices along with (iii)
give (3). Lastly, recall that (4) is satisfied if we replace $d_{i,j}$ by
$p_{i,j}$. Thus, by (8), the same can be said for $\tilde{p}_{i,j}$. Combining
this with (ii) yields (4).

Since the entries of $F^{(s_k, t_k)}$ are non-negative for every $k$,
we must have that $d_{i,j} \ge \tilde{p}_{i,j}$ for all $i, j \in V$. This, together
with (2) and (7), implies in turn that $M' \le \mathbf{1} \cdot D \cdot \mathbf{1}^T \le M$.

■

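Running Example (4): For the matrix $\tilde{P}$ in Figure 6 we
have

$$\tilde{r} = \begin{pmatrix} 6 & 2 & 1 \end{pmatrix} , \quad \tilde{\rho} = \begin{pmatrix} 6 & 3 & 0 \end{pmatrix} .$$

Thus, Surp = 1. Namely, the third vertex has a surplus while the
second vertex, $\beta$, has a deficiency. Taking $s$ to be the third vertex and $t = \beta$ we get

$$F^{(s,t)} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} , \quad \text{and} \quad D = \begin{pmatrix} 4 & 3 & 0 \\ 2 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} .$$

■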
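The construction in Lemma 4 and Theorem 5 can be sketched in a few lines: find a shortest path from each surplus vertex to a deficiency vertex, count its edges in a matrix, and add that matrix to the quantized matrix. In the sketch below, vertices are numbered 0, 1, 2; the adjacency matrix `A` and the quantized matrix `P_tilde` are assumptions chosen to agree with the row and column sums and the resulting matrices of the running example. The graph is assumed irreducible, so every shortest path exists, and surplus and deficiency vertices are simply paired in index order.

```python
from collections import deque

def shortest_path(A, s, t):
    """Vertices along a shortest path from s to t, found by breadth-first search."""
    previous = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v in range(len(A)):
            if A[u][v] > 0 and v not in previous:
                previous[v] = u
                queue.append(v)
    path = [t]
    while path[-1] != s:
        path.append(previous[path[-1]])
    return path[::-1]

def path_matrix(A, s, t):
    """Entry (i, j) counts the edges from i to j on a shortest s-to-t path."""
    n = len(A)
    F = [[0] * n for _ in range(n)]
    path = shortest_path(A, s, t)
    for u, v in zip(path, path[1:]):
        F[u][v] += 1
    return F

def augment(P_tilde, A):
    """Add one path matrix per (surplus, deficiency) pair to P_tilde."""
    n = len(A)
    row_sums = [sum(P_tilde[i]) for i in range(n)]
    col_sums = [sum(P_tilde[i][j] for i in range(n)) for j in range(n)]
    surplus = [i for i in range(n) if col_sums[i] > row_sums[i]]
    deficiency = [i for i in range(n) if col_sums[i] < row_sums[i]]
    D = [row[:] for row in P_tilde]
    for s, t in zip(surplus, deficiency):
        F = path_matrix(A, s, t)
        for i in range(n):
            for j in range(n):
                D[i][j] += F[i][j]
    return D

A = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]            # assumed adjacency matrix
P_tilde = [[4, 2, 0], [2, 0, 1], [0, 0, 0]]      # assumed good quantization
D = augment(P_tilde, A)
```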

Now that Theorem 5 is proved, we are in a position to prove
our main result, Theorem 1. Essentially, the proof involves
using the Stirling approximation and taking into account the
various quantization errors introduced into D. The proof itself
is given in the Appendix.

V. Enumerative coding into sequences with a
given Markov type

The main motivation for our methods is 2-D constrained
coding. However, in this section, we show that they might be
interesting in certain aspects of 1-D coding as well. Given a
labeled graph $G$, a classic method for building an encoder for
the 1-D constraint $S(G)$ is the state-splitting algorithm [2].
The rate of an encoder built by [2] approaches the capacity
of $S(G)$. Also, the word the encoder outputs has a corresponding
path in $G$, with the following favorable property:
the probability of traversing a certain edge approaches the
maxentropic probability of that edge (assuming an unbiased
source distribution). However, what if we would like to build
an encoder with a different probability distribution on the
edges? This scenario may occur, for example, when there is
a requirement that all the output words of a given length $N$
that are generated by the encoder have a prescribed Hamming
weight².

More formally, suppose that we are given a labeled graph
$G = (V, E, L)$; to make the exposition simpler, suppose that
$G$ does not contain parallel edges. Let $Q$ and $\pi$ be a transition
matrix and a stationary probability distribution corresponding
to a stationary (but not necessarily maxentropic) Markov chain
$\mathcal{P}$ on $G$. We assume w.l.o.g. that each edge in $G$ has a positive
conditional probability. We are also given an integer $M$, which
we will shortly elaborate on.

We first describe our encoder in broad terms, so that
its merits will be obvious. Let $D$ and $N$ be as previously
defined, and let $R_T(D)$ be specified shortly. We start at some
fixed vertex $v_0 \in V$. Given $M \cdot R_T(D)$ information bits, we
traverse a soon-to-be-defined cyclic path of length $N$ in $G$. The
concatenation of the edge labels along the path is the word we
output. Of course, since the path is cyclic, the concatenation
of such words is indeed in $S(G)$. Moreover, the path will have
the following key property: the number of times an edge from
$i$ to $j$ is traversed equals $d_{i,j}$. Namely, if we uniformly pick
one of the $N$ edges of the path, the probability of picking a
certain edge $e$ is constant (not a function of the input bits), and
is equal to the probability of traversing $e$ on the Markov chain
$\mathcal{P}$, up to a small quantization error. The rate $R_T$ of our encoder
will satisfy (1), where we replace $R$ by $R_T$ and $\mathrm{cap}(S)$ by
the entropy of $\mathcal{P}$. We would like to be able to exactly specify
the path length as a design parameter. However, we specify
$M$ and get an $N$ between $M - \lfloor |V| \, \mathrm{diam}(G)/2 \rfloor$ and $M$.

Our encoding process will make use of an oriented tree,
a term which we will now define. A set of edges $T \subseteq E$
is an oriented tree of $G$ with root $v_0$ if $|T| = |V| - 1$ and
for each $u \in V$ there exists a path from $u$ to $v_0$ consisting
entirely of edges in $T$ (see Figure 7). Note that if we reverse
the edge directions of an oriented tree, we get a directed tree
as defined in [11, Theorem 2.5]. Since reversing the directions
of all edges in an irreducible graph results in an irreducible
graph, we have by [11, Lemma 3.3] that an oriented tree $T$
indeed exists in $G$, and can be efficiently found. So, let us fix
some oriented tree $T$ with root $v_0$. By [11, Theorem 2.5], we
have that every vertex $u \in V$ which is not the root $v_0$ has an
out-degree in $T$ equal to 1. Thus, for each such vertex $u$ we may
define parent($u$) as the destination of the single edge in $T$
going out of $u$.

Fig. 7. Oriented tree with root $v_0$.

We now elaborate on the encoding process. The encoding
consists of two steps. In the first step, we map the information
bits to a collection of lists. In the second step, we use the lists
in order to define a cyclic path.

First step: Given $M \cdot R_T(D)$ information bits, we build for
each vertex $i \in V$ a list $\Lambda^{(i)}$ of length $r_i$,

$$\Lambda^{(i)} = ( \lambda_1^{(i)}, \lambda_2^{(i)}, \ldots, \lambda_{r_i}^{(i)} ) .$$

The entries of each $\Lambda^{(i)}$ are vertices in $V$. Moreover, the
following properties are satisfied for all $i$:

• The number of times $j$ is an entry in $\Lambda^{(i)}$ is exactly $d_{i,j}$.

• If $i \ne v_0$, then the last entry of the list equals the parent
of $i$. Namely,

$$\lambda_{r_i}^{(i)} = \mathrm{parent}(i) .$$

Recalling (5), a simple calculation shows that the number
of possible list collections is

$$\Delta_T = \Delta \cdot \prod_{i \in V \setminus \{v_0\}} \frac{d_{i,\mathrm{parent}(i)}}{r_i} . \tag{14}$$

Thus, we define the rate of encoding as

$$R_T(D) = \frac{\lfloor \log_2 \Delta_T \rfloor}{M} .$$

Also, note that as in the 2-D case, we may use enumerative
coding in order to efficiently map information bits to lists.

²We remark in passing that one may use convex programming techniques
(see [18, §V]) in order to efficiently solve the following optimization problem:
find a probability distribution on the edges of $G$ yielding a stationary Markov
chain with largest possible entropy, subject to a set of edges (such as the set
of edges with label '1') having a prescribed cumulative probability.

Second step: We now use the lists $\Lambda^{(i)}$, $i \in V$, in order to
construct a cyclic path starting at vertex $v_0$. We start the path
at $v_0$ and build a length-$N$ path according to the following
rule: when exiting vertex $i$ for the $k$th time, traverse the edge
going into vertex $\lambda_k^{(i)}$.

Of course, our encoding method is valid (and invertible) iff
we may always abide by the above-mentioned rule. Namely,
we don't get "stuck", and manage to complete a cyclic path
of length $N$. This is indeed the case: define an auxiliary graph
$G(D)$ with the same vertex set, $V$, as $G$ and $d_{i,j}$ parallel edges
from $i$ to $j$ (for all $i, j \in V$). First, recall that for sufficiently
large $M$, the presence of an edge from $i$ to $j$ in $G$ implies that
$d_{i,j} > 0$. Thus, since $G$ was assumed to be irreducible, $G(D)$
is irreducible as well. Also, an edge in $T$ from $i$ to $j$ implies
the existence of an edge in $G(D)$ from $i$ to $j$. Secondly, note
that by (3), the number of times we are supposed to exit a
vertex is equal to the number of times we are supposed to
enter it. The rest of the proof follows from [19, p. 56, Claim
2], applied to the auxiliary graph $G(D)$. Namely, our encoder
follows directly from van Aardenne-Ehrenfest and de Bruijn's
[1] theorem on counting Eulerian cycles in a graph.
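The second step and its inverse can be sketched as follows. The lists are held in a dictionary keyed by vertex; the function names, the vertex numbering 0, 1, 2, the root 0, the tree with parent(1) = parent(2) = 0, and the particular lists are illustrative assumptions. The lists realize 4 edges from 0 to itself, 3 from 0 to 1, 2 from 1 to 0, 1 from 1 to 2, and 1 from 2 to 0, and the last entry of each non-root list is the parent of that vertex.

```python
def lists_to_path(lists, root):
    """Walk from the root; the k-th exit from vertex i goes to lists[i][k-1]."""
    exits_used = {vertex: 0 for vertex in lists}
    path = [root]
    current = root
    while exits_used[current] < len(lists[current]):
        target = lists[current][exits_used[current]]
        exits_used[current] += 1
        path.append(target)
        current = target
    return path

def path_to_lists(path):
    """Decoder side: recover the lists by recording successors in exit order."""
    lists = {}
    for u, v in zip(path, path[1:]):
        lists.setdefault(u, []).append(v)
    return lists

# Hypothetical list collection; the lists of vertices 1 and 2 end with the parent 0.
example_lists = {
    0: [1, 0, 1, 0, 0, 1, 0],
    1: [2, 0, 0],
    2: [0],
}
example_path = lists_to_path(example_lists, root=0)
```

The walk stops only when the current vertex has no unused list entries; the argument above guarantees that this happens at the root after all $N = 11$ entries have been consumed.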

We now return to the rate, $R_T$, of our encoder. From (6),
(9), (10) and Theorem 5 we see that for $M$ sufficiently large,
$\Delta_T$ is greater than some positive constant times $\Delta$. Thus, (1)
still holds if we replace $R$ by $R_T$ and $\mathrm{cap}(S)$ by the entropy
of $\mathcal{P}$.



VI. An example, and two improvement techniques 

Recall from Section II the square constraint: its elements
are all the binary arrays in which no two '1' symbols are 
adjacent on a row, column, or diagonal. By employing the 
methods presented in [6], we may calculate an upper bound 
on the rate of the constraint. This turns out to be 0.425078. 
We will show an encoding/decoding method with rate slightly 
larger than 0.396 (about 93% of the upper bound). In order to 
do this, we assume that the array has 100,000 columns. Our 
encoding method has a fixed rate and has a vertical window 
of size 2 and vertical anticipation 0. 

We should point out now that a straightforward implemen- 
tation of the methods we have previously defined gives a rate 
which is strictly less than 0.396. Namely, this section also 
outlines two improvement techniques which help boost the 
rate. 

We start out as in the example given in Section II, except
that the width of the data strips is now $w_d = 9$ (the width of
the merging strips remains $w_m = 1$). The graph $G$ we choose
produces all width-$w_d$ arrays satisfying the square constraint,
and we take the merging strips to be all-zero. Our array has
100,000 columns, so we have $M = 10{,}000$ tracks (the last,
say, column of the array will essentially be unused; we can
set all of its values to 0).

Define the normalized capacity as

$$\frac{\mathrm{cap}(S(G))}{w_d + w_m} .$$

The graph $G$ has $|V| = 89$ vertices and normalized capacity

$$\frac{\mathrm{cap}(S(G))}{w_d + w_m} \approx \frac{\mathrm{cap}(S(G))}{w_d + w_m (1 - 1/M)} \approx 0.402 .$$

This number is about 94.5% of the upper bound on the
capacity of our 2-D constraint. Thus, as expected, there is an
inherent loss in choosing to model the 2-D constraint as an
essentially 1-D constraint. Of course, this loss can be made
smaller by increasing $w_d$ (but the graph $G$ will grow as well).

From Theorem 1, the rate of our encoder will approach
the normalized capacity of 0.402 as the number of tracks $M$
grows. So, once the graph $G$ is chosen, the parameter we
should be comparing ourselves to is the normalized capacity.
We now apply the methods defined in Section IV and find
a multiplicity matrix $D$. Recall that the matrix $D$ defines
an encoder. In our case, this encoder has a rate of about
0.381. This is 94% of the normalized capacity, and is quite
disappointing (but the improvements shown in Sections VI-A
and VI-B below are going to improve this rate). On the other
hand, note that if we had limited ourselves to encode to each
track independently of the others, then the best rate we could
have hoped for with zero vertical anticipation turns out to be 0.3
(see [17, Theorem 5]).

A. Moore-style reduction 

We now define a graph $\mathcal{G}$ which we call the reduction of
$G$. Essentially, we will encode by constructing paths in $\mathcal{G}$,
and then translate these to paths in $G$. In both $G$ and $\mathcal{G}$, the
maxentropic distributions have the same entropy. The main
virtue of $\mathcal{G}$ is that it often has fewer vertices and edges compared
to $G$. Thus, the penalty in (1) resulting from using a finite
number of tracks will often be smaller.

For $s \ge 0$, we now recursively define the concept of 
$s$-equivalence (very much like in the Moore algorithm [15, page 
1660]). 

• For $s = 0$, any two vertices $v_1, v_2 \in V$ are 0-equivalent. 

• For $s > 0$, two vertices $v_1, v_2 \in V$ are $s$-equivalent iff 
1) the two vertices $v_1, v_2$ are $(s-1)$-equivalent, and 2) 
for each $(s-1)$-equivalence class $c$, the number of edges 
from $v_1$ to vertices in $c$ is equal to the number of edges 
from $v_2$ to vertices in $c$. 

Denote by $\Pi_s$ the partition induced by $s$-equivalence. For the 
graph G given in Figure |3] 

$$\Pi_0 = \{0000, 0001, 0010, 0100, 0101, 1000, 1001, 1010\} ,$$

$$\Pi_{s \ge 1} = \{0000\}, \{0010, 0100\}, \{1000, 0001\}, \{1010, 1001, 0101\} .$$

Note that, by definition, $\Pi_{s+1}$ is a refinement of $\Pi_s$. Thus, let 
$s'$ be the smallest $s$ for which $\Pi_s = \Pi_{s+1}$. The set $\Pi_{s'}$ can 
be efficiently found (essentially, by the Moore algorithm [15, 
page 1660]). 
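
The refinement described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `moore_partition`, the representation of the graph as a list of (start, end) pairs, and the 4-vertex example are hypothetical and are not the graph of the running example.

```python
from collections import Counter

def moore_partition(vertices, edges):
    """Refine the trivial partition until s-equivalence stabilizes.

    vertices: iterable of hashable vertex names.
    edges: list of (start, end) pairs; parallel edges are repeated pairs.
    Returns a dict mapping each vertex to the index of its final class.
    """
    vertices = list(vertices)
    cls = {v: 0 for v in vertices}           # s = 0: a single class
    num_classes = 1
    while True:
        # Count, for every vertex, its edges into each current class.
        out_counts = {v: Counter() for v in vertices}
        for start, end in edges:
            out_counts[start][cls[end]] += 1
        # Two vertices stay together iff they share a class and the counts.
        signature = {
            v: (cls[v], tuple(sorted(out_counts[v].items())))
            for v in vertices
        }
        labels, new_cls = {}, {}
        for v in vertices:
            new_cls[v] = labels.setdefault(signature[v], len(labels))
        if len(labels) == num_classes:        # no class was split: stop
            return new_cls
        cls, num_classes = new_cls, len(labels)

# Hypothetical example graph (assumption, for illustration only).
edges = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "d"), ("d", "b"), ("d", "c")]
partition = moore_partition("abcd", edges)
print(partition)
```

Since every round can only split classes, comparing the number of classes before and after a round is enough to detect that the partition has stabilized.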

Define a (non-labeled) graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ as follows. The 
vertex set of $\mathcal{G}$ is 

$$\mathcal{V} = \Pi_{s'} .$$

For each $c \in \mathcal{V}$, let $v(c)$ be a fixed element of $c$ (if $c$ contains 
more than one vertex, then pick one arbitrarily). Also, for each 
$v \in V$, let $c(v)$ be the class $c \in \mathcal{V}$ such that $v \in c$. Let $\sigma_G(e)$ 
($\sigma_{\mathcal{G}}(e)$) and $\tau_G(e)$ ($\tau_{\mathcal{G}}(e)$) denote the start and end vertex of 
an edge $e$ in $G$ ($\mathcal{G}$), respectively. The edge set $\mathcal{E}$ is defined as 

$$\mathcal{E} = \bigcup_{c \in \mathcal{V}} \{ e \in E : \sigma_G(e) = v(c) \} , \qquad (15)$$

where 

$$\sigma_{\mathcal{G}}(e) = c(\sigma_G(e)) \quad \text{and} \quad \tau_{\mathcal{G}}(e) = c(\tau_G(e)) .$$

Namely, the number of edges from $c_1$ to $c_2$ in $\mathcal{G}$ is equal to the 
number of edges in $G$ from some fixed $v_1 \in c_1$ to elements of 
$c_2$, and, by the definition of $s'$, this number does not depend 
on the choice of $v_1$. The graph $\mathcal{G}$ is termed the reduction of $G$. 
The reduction of $G$ from Figure |3] is given in Figure 8. Note 
that since $G$ was assumed to be irreducible, we must have that 
$\mathcal{G}$ is irreducible as well. 
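
As a minimal sketch of the construction in (15), the following Python function builds the edge multiset of the reduction from a given stable partition. The name `reduce_graph`, the input format, and the small example graph with its partition are our own assumptions for illustration.

```python
from collections import Counter

def reduce_graph(edges, cls):
    """Build the edge multiset of the reduction from a stable partition.

    edges: list of (start, end) pairs of G (parallel edges repeated).
    cls:   dict mapping each vertex of G to its class; assumed to be the
           partition obtained when the refinement stops.
    Returns a Counter mapping (c1, c2) to the number of edges from c1 to c2.
    """
    # Pick one representative v(c) per class (here: the first one listed).
    representative = {}
    for v in cls:
        representative.setdefault(cls[v], v)
    reduced = Counter()
    for start, end in edges:
        if representative[cls[start]] == start:   # keep edges leaving v(c)
            reduced[(cls[start], cls[end])] += 1
    return reduced

# Hypothetical example (assumption): classes {a, d} and {b, c}.
edges = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "d"), ("d", "b"), ("d", "c")]
cls = {"a": 0, "b": 1, "c": 1, "d": 0}
reduced = reduce_graph(edges, cls)
print(dict(reduced))
```

Listing the vertices of `cls` in a different order changes the representatives but, for a stable partition, not the resulting edge counts.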




Fig. 8. Reduction of the graph $G$ from Figure [5] 



Lemma 6: The entropies of the maxentropic Markov chains 
on $G$ and $\mathcal{G}$ are equal. 

Proof: Let $\mathcal{A} = A_{\mathcal{G}}$ be the adjacency matrix of $\mathcal{G}$, and 
recall that $A = A_G$ is the adjacency matrix of $G$. Let $\lambda'$ 
and $x' = (x'_c)_{c \in \mathcal{V}}$ be the Perron eigenvalue and right Perron 
eigenvector of $\mathcal{A}$, respectively [15, §3.1]. Next, define the 
vector $x = (x_v)_{v \in V}$ as 

$$x_v = x'_{c(v)} .$$

It is easily verifiable that $x$ is a right eigenvector of $A$, with 
eigenvalue $\lambda'$. Now, since $x'$ is a Perron eigenvector of an 
irreducible matrix, each entry of it is positive. Thus, each entry 
of $x$ is positive as well. Since $A$ is irreducible, we must have 
that $x$ is a Perron eigenvector of $A$. So, the Perron eigenvalue 
of $A$ is also $\lambda'$. ■ 
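
The eigenvector claim in the proof can be checked directly. The derivation below is a sketch under two assumptions: the vector is defined by $x_v = x'_{c(v)}$, and the entry of the reduction's adjacency matrix $\mathcal{A}$ indexed by the classes $(c, c')$ counts the edges of $G$ from any $v \in c$ into $c'$ (a count which, by the definition of $s'$, does not depend on $v$). For every $v \in V$ with $c = c(v)$:

```latex
\begin{align*}
(Ax)_v &= \sum_{e \in E \,:\, \sigma_G(e) = v} x_{\tau_G(e)}
        = \sum_{c' \in \mathcal{V}}
          \bigl|\{ e \in E : \sigma_G(e) = v,\ \tau_G(e) \in c' \}\bigr|
          \cdot x'_{c'} \\
       &= \sum_{c' \in \mathcal{V}} \mathcal{A}_{c,c'} \, x'_{c'}
        = \lambda' x'_{c}
        = \lambda' x_v .
\end{align*}
```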

The next lemma essentially states that we can think of paths 
in $G$ as if they were paths in $\mathcal{G}$. 

Lemma 7: Let $\ell \ge 1$. Fix some $c_0, c_{\ell+1} \in \mathcal{V}$, and $v_0 \in c_0$. 
There exists a one-to-one correspondence between the 
following sets. First set: paths of length $\ell$ in $\mathcal{G}$ with start vertex 
$c_0$ and end vertex $c_{\ell+1}$. Second set: paths of length $\ell$ in $G$ 
with start vertex $v_0$ and end vertex in $c_{\ell+1}$. 

Moreover, for $1 \le t \le \ell - 1$, the first $t$ edges in a path 
belonging to the second set are a function of only the first $t$ 
edges in the respective path in the first set. 

Proof: We prove this by induction on $\ell$. For $\ell = 1$, we 
have 

$$|\{ e \in \mathcal{E} : \sigma_{\mathcal{G}}(e) = c_0 ,\ \tau_{\mathcal{G}}(e) = c_1 \}| = |\{ e \in E : \sigma_G(e) = v_0 ,\ \tau_G(e) \in c_1 \}| .$$

To see this, note that we can assume w.l.o.g. that $v_0 = v(c_0)$, 
and then recall (15). For $\ell > 1$, combine the claim for $\ell - 1$ 
with that for $\ell = 1$. ■ 
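
One concrete way to realize the correspondence of Lemma 7 is sketched below. It is an illustration under our own assumptions, not the mapping used by the authors: an edge of the reduction is represented by its target class together with a rank `k` among the parallel edges into that class, and the names `lift_path`, `out_edges` and `cls` are hypothetical.

```python
def lift_path(reduced_path, v0, out_edges, cls):
    """Translate a path in the reduction into a path in G that starts at v0.

    reduced_path: list of (target_class, k) pairs; each pair selects the
                  k-th edge (k = 0, 1, ...) into target_class.
    out_edges:    dict mapping a vertex of G to the ordered list of end
                  vertices of its outgoing edges (parallel edges repeated).
    cls:          dict mapping each vertex of G to its class.
    Returns the list of (vertex, edge_index) steps taken and the end vertex.
    """
    v, lifted = v0, []
    for target_class, k in reduced_path:
        # Edges leaving v that end in the requested class, in a fixed order.
        candidates = [i for i, w in enumerate(out_edges[v])
                      if cls[w] == target_class]
        edge_index = candidates[k]
        lifted.append((v, edge_index))
        v = out_edges[v][edge_index]
    return lifted, v

# Hypothetical example (assumption): classes {a, d} -> 0 and {b, c} -> 1.
out_edges = {"a": ["b", "c"], "b": ["a"], "c": ["d"], "d": ["b", "c"]}
cls = {"a": 0, "b": 1, "c": 1, "d": 0}
steps, end = lift_path([(1, 1), (0, 0), (1, 0)], "d", out_edges, cls)
print(steps, end)
```

Each step of the lifted path depends only on the steps of the reduced path seen so far, which is the property that makes a row-by-row implementation possible.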
Notice that $\mathrm{diam}(\mathcal{G}) \le \mathrm{diam}(G)$. We now show why $\mathcal{G}$ is 
useful. 

Theorem 8: Let $D$ be the multiplicity matrix found by the 
methods previously outlined, where we replace $G$ by $\mathcal{G}$. Let 
$N = \mathbf{1} \cdot D \cdot \mathbf{1}^T$. We may efficiently encode (and decode) 
information to $G^{\otimes N}$ in a row-by-row manner at rate $R(D)$. 

Proof: We conceptually break our encoding scheme into 
two steps. In the first step, we "encode" (map) the information 
into $N$ paths in $\mathcal{G}$, each path having length $\ell$. We do this as 
previously outlined (through typical vertices and edges in $\mathcal{G}$). 
Note that this step is done at a rate of $R(D)$. In the second 
step, we map each such path in $\mathcal{G}$ to a corresponding path in 
$G$. By Lemma 7 we can indeed do this (take $c_0$ as the first 
vertex in the path, $c_{\ell+1}$ as the last vertex, and $v_0 = v(c_0)$). 

By Lemma [T] we see that this two-step encoding scheme 
can easily be modified into one that is row-by-row. ■ 

Applying the reduction to our running example (square 
constraint with $W_d = 9$ and $W_m = 1$) reduces the number of 
vertices from 89 in $G$ to 34 in $\mathcal{G}$. The computed $D$ increases 
the rate to about 0.392, which is 97.5% of the normalized 
capacity. 



B. Break-merge 

Let $\mathcal{G}^{\otimes N}$ be the $N$th Kronecker power of the Moore-style 
reduction $\mathcal{G}$. Recall that the rate of our encoder is 

$$R(D) = \frac{\lfloor \log_2 \Delta \rfloor}{M} ,$$

where $\Delta$ is the number of typical edges in $\mathcal{G}^{\otimes N}$ going out of 
a typical vertex. The second improvement involves expanding 
the definition of a typical edge, thus increasing $\Delta$. This is best 
explained through an example. Suppose that $\mathcal{G}$ has Figure 9 as 
a subgraph; namely, we show all edges going out of vertices 
$\alpha$ and $\beta$. Also, let the numbers next to the edges be equal to 
the corresponding entries in $D$. The main thing to notice at 
this point is that if the edges to $\varepsilon$ and $\zeta$ are deleted ("break"), 
then $\alpha$ and $\beta$ have exactly the same number of edges from 
them to vertex $j$, for all $j \in \mathcal{V}$ (after the deletion of edges, 
vertices $\alpha$ and $\beta$ can be "merged"). 




Fig. 9. Break-merge example graph. 

Let $\mathbf{v}$ be a typical vertex. A short calculation shows that the 
number of entries in $\mathbf{v}$ that are equal to $\alpha$ ($\beta$) is $5 + 4 + 3 = 12$ 
($9 + 7 + 2 = 18$). Recall that the standard encoding process 
consists of choosing a typical edge $\mathbf{e}$ going out of the typical 
vertex $\mathbf{v}$ and into another typical vertex $\mathbf{v}'$. We now briefly 
review this process. Consider the 12 entries in $\mathbf{v}$ that are equal 
to $\alpha$. The encoding process with respect to them will be as 
follows (see Figure 10): 

• Out of these 12 entries, choose 5 for which the 
corresponding entry in $\mathbf{v}'$ will be $\varepsilon$. Since there is exactly one 
edge from $\alpha$ to $\varepsilon$ in $\mathcal{G}$, the corresponding entries in $\mathbf{e}$ 
must be equal to that edge. 

• Next, from the remaining 7 entries, choose 4 for which 
the corresponding entries in $\mathbf{v}'$ will be $\theta$. There are two 
parallel edges from $\alpha$ to $\theta$, so choose which one to use 
in the corresponding entries in $\mathbf{e}$. 

• We are left with 3 entries; the corresponding entries 
in $\mathbf{v}'$ will be $\delta$. Also, we have one option as to the 
corresponding entries in $\mathbf{e}$. 

A similar process is applied to the entries in $\mathbf{v}$ that are equal 
to $\beta$. Thus, the total number of options with respect to these 
entries is 

$$\frac{12! \cdot 2^4}{5! \cdot 4! \cdot 3!} \cdot \frac{18! \cdot 2^9}{2! \cdot 9! \cdot 7!} \approx 3.97 \cdot 10^{14} .$$

Next, consider a different encoding process (see Figure 11). 

• Out of the 12 entries in $\mathbf{v}$ that are equal to $\alpha$, choose 5 for 
which the corresponding entry in $\mathbf{v}'$ will be $\varepsilon$. As before, 
the corresponding entries in $\mathbf{e}$ have only one option. 

• Out of the 18 entries in $\mathbf{v}$ that are equal to $\beta$, choose 2 
for which the corresponding entry in $\mathbf{v}'$ will be $\zeta$. Again, one 
option for the entries in $\mathbf{e}$. 

• Now, of the remaining 23 entries in $\mathbf{v}$ that are equal to 
$\alpha$ or $\beta$, choose $4 + 9 = 13$ for which the corresponding 
entry in $\mathbf{v}'$ will be $\theta$. We have two options for the entries 
in $\mathbf{e}$. 

• We are left with $3 + 7 = 10$ entries in $\mathbf{v}$ that are equal 
to $\alpha$ or $\beta$. These will have $\delta$ as the corresponding entry 
in $\mathbf{v}'$, and one option in $\mathbf{e}$. 

Fig. 10. Illustration of the entries in two typical vertices $\mathbf{v}, \mathbf{v}'$, where we 
got from $\mathbf{v}$ to $\mathbf{v}'$ by the standard encoding process. 

Fig. 11. Illustration of the entries in two typical vertices $\mathbf{v}, \mathbf{v}'$, where we got 
from $\mathbf{v}$ to $\mathbf{v}'$ by the improved encoding process. The shaded part corresponds 
to vertices that were merged. 



Thus, the total number of options is now 

$$\binom{12}{5} \binom{18}{2} \cdot \frac{23! \cdot 2^{13}}{13! \cdot 10!} \approx 1.14 \cdot 10^{15} .$$



The important thing to notice is that in both cases, we arrive 
at a typical vertex $\mathbf{v}'$. 
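
The two counts can be reproduced with exact integer arithmetic. The short Python sketch below is only a check of the arithmetic in the example; the helper `multinomial` and the variable names are our own.

```python
from math import comb, factorial

def multinomial(total, parts):
    """Number of ways to split `total` items into groups of the given sizes."""
    assert sum(parts) == total
    result = factorial(total)
    for part in parts:
        result //= factorial(part)
    return result

# Standard process: the 12 and the 18 entries are handled separately.
# 12 entries -> groups of 5, 4, 3; the group of 4 has two parallel edges.
# 18 entries -> groups of 9, 7, 2; the group of 9 has two parallel edges.
standard = (multinomial(12, [5, 4, 3]) * 2**4) \
         * (multinomial(18, [9, 7, 2]) * 2**9)

# Break-merge: split off 5 of the 12 and 2 of the 18, then treat the
# remaining 23 entries jointly (13 with two parallel edges, 10 with one).
improved = comb(12, 5) * comb(18, 2) * multinomial(23, [13, 10]) * 2**13

print(f"{standard:.3e} {improved:.3e} ratio {improved / standard:.2f}")
```

The ratio of the two counts is the gain in the number of typical edges contributed by this pair of vertices alone.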

To recap, we first "broke" the entries in $\mathbf{v}$ that are equal 
to $\alpha$ into two groups: those which will have $\varepsilon$ as the 
corresponding entry in $\mathbf{v}'$, and those which will have $\theta$ or $\delta$ 
as the corresponding entry. Similarly, we broke the entries in $\mathbf{v}$ 
that are equal to $\beta$ into two groups. Next, we noticed that of 
these four groups, two could be "merged", since they were 
essentially the same. Namely, removing some edges from the 
corresponding vertices in $\mathcal{G}$ resulted in vertices which were 
mergeable. 

Of course, these operations can be repeated. The hidden 
assumption is that the sequence of breaking and merging is 
fixed, and known to both the encoder and decoder. The optimal 
sequence of breaking and merging is not known to us. We used 
a heuristic: choose two vertices such that the sets of 
edges emanating from both have a large overlap; then, break 
and merge accordingly. This was repeated until no further breaking or 
merging was possible. We got a rate of about 0.396, which is 
98.5% of the normalized capacity. 

VII. Fast enumerative coding 

Recall from Section III that in the course of our encoding 
algorithm, we make use of a procedure which encodes 
information into fixed-length binary words of constant weight. 
A way to do this would be to use enumerative coding [9]. 
Immink [13] showed a method to significantly improve the 
running time of an instance of enumerative coding, with a 
typically negligible penalty in terms of rate. We now briefly 
show how to tailor Immink's method to our needs. 

Denote by $n$ and $\delta$ the length and Hamming weight, 
respectively, of the binary word we encode into. Some of our 
variables will be floating-point numbers with a mantissa of $\mu$ 
bits and an exponent of $e$ bits: each floating-point number is 
of the form $x = a \cdot 2^b$, where $a$ and $b$ are integers such that 

$$2^{\mu} \le a < 2^{\mu+1} \quad \text{and} \quad -2^{e-1} \le b < 2^{e-1} .$$

Note that $\mu + e$ bits are needed to store such a number. Also, 
note that every positive real $x$ such that 

$$2^{\mu} \cdot 2^{-2^{e-1}} \le x \le (2^{\mu+1} - 1) \cdot 2^{2^{e-1}-1}$$

has a floating-point approximation $\hat{x}$ with relative precision 

$$\left( 1 - \frac{1}{2^{\mu}} \right) \le \frac{\hat{x}}{x} \le \left( 1 + \frac{1}{2^{\mu}} \right) . \qquad (16)$$
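
To make the representation concrete, here is a minimal Python sketch that converts a positive integer (such as a factorial) into a pair $(a, b)$ with a mantissa in the stated range. Truncation is used as the rounding rule; this is our own assumption, since the text does not fix a rounding mode, and the function name `to_float` is hypothetical. The range restriction on the exponent $b$ is not checked.

```python
from fractions import Fraction
from math import factorial

def to_float(x, mu):
    """Approximate a positive integer x by a * 2**b, 2**mu <= a < 2**(mu+1).

    Truncation (rounding toward zero) is an assumption of this sketch.
    Returns the pair (a, b).
    """
    if x <= 0:
        raise ValueError("x must be a positive integer")
    b = x.bit_length() - (mu + 1)     # leaves exactly mu + 1 mantissa bits
    a = x >> b if b >= 0 else x << -b
    return a, b

mu = 10
a, b = to_float(factorial(20), mu)
ratio = Fraction(a) * Fraction(2) ** b / factorial(20)
print(a, b, float(ratio))
```

With truncation the approximation never exceeds the true value, and the relative error is below $2^{-\mu}$.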



We assume the presence of two look-up tables. The first 
will contain the floating-point approximations of $1!, 2!, \ldots, n!$. 
The second will contain the floating-point approximations of 
$f(0), f(1), \ldots, f(\delta)$, where 

$$f(x) = f_{\mu}(x) = 1 - \frac{32x + 16}{2^{\mu}} .$$

In order to exclude uninteresting cases, assume that $\mu \ge 10$ 
and is such that $f(\delta) \ge 1/2$. Also, take $e$ large enough so 
that $n!$ is less than the maximum number we can represent by 
floating point. Thus, we can assume that $\mu = O(\log \delta)$ and 
$e = O(\log n)$. 

Notice that in our case, we can bound both $n$ and $\delta$ from 
above by the number of tracks $M$. Thus, we will actually build 
beforehand two look-up tables of size $2M(\mu + e)$ bits. 

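Let $\hat{x}$ denote the floating-point approximation of $x$, and 
let $*$ and $\div$ denote floating-point multiplication and division, 
respectively. For $0 \le \chi \le \kappa \le n$ we define 

$$\left\lceil {\kappa \atop \chi} \right\rceil = \left\lceil \left( \widehat{\kappa!} * \hat{f}(\chi) \right) \div \left( \widehat{\chi!} * \widehat{(\kappa - \chi)!} \right) \right\rceil .$$

Note that since we have stored the relevant numbers in our 
look-up table, the time needed to calculate the above function 
is only $O(\mu^2 + e)$. The encoding procedure is given in 
Figure 12. We note the following points: 

• The variables $n$, $\psi$, $\delta$ and $t$ are integers (as opposed to 
floating-point numbers). 

• In the subtraction of $\left\lceil {n-t \atop \delta-1} \right\rceil$ from $\psi$ in line 5, the 
floating-point number $\left\lceil {n-t \atop \delta-1} \right\rceil$ is "promoted" to an integer (the result 
is an integer). 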



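Name: EnumEncode($n$, $\delta$, $\psi$) 

Input: Integers $n, \delta, \psi$ such that $0 \le \delta \le n$ and $0 \le \psi < \left\lceil {n \atop \delta} \right\rceil$. 
Output: A binary word of length $n$ and weight $\delta$. 

if ($\delta$ == 0) // stopping condition: /* 1 */ 
    return $\underbrace{00 \ldots 0}_{n}$; /* 2 */ 
for ($t \leftarrow 1$; $t \le n - \delta + 1$; $t$++) { /* 3 */ 
    if ($\psi \ge \left\lceil {n-t \atop \delta-1} \right\rceil$) /* 4 */ 
        $\psi \leftarrow \psi - \left\lceil {n-t \atop \delta-1} \right\rceil$; /* 5 */ 
    else /* 6 */ 
        return $\underbrace{00 \ldots 0}_{t-1} 1 \, \|$ EnumEncode($n - t$, $\delta - 1$, $\psi$); /* 7 */ 
} /* 8 */ 

Fig. 12. Enumerative encoding procedure for constant-weight binary words. 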
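
The structure of the procedure in Fig. 12 can be tried out with the following Python sketch, in which the recursion is unrolled into a loop and the weight function is pluggable. Several things here are our own assumptions: the names `enum_encode`, `enum_decode` and `approx_weight`; the use of exact binomials (`math.comb`) as the default, which gives plain enumerative coding; and `approx_weight`, which is an exact-arithmetic stand-in that multiplies the binomial by $f(d) = 1 - (32d + 16)/2^{\mu}$ and rounds up. It is not the floating-point computation of the text.

```python
from fractions import Fraction
from math import ceil, comb

def approx_weight(n, d, mu=10):
    """Stand-in for the approximate binomial: C(n, d) * f(d), rounded up.
    Exact rational arithmetic is used instead of floating point."""
    return ceil(comb(n, d) * (1 - Fraction(32 * d + 16, 2**mu)))

def enum_encode(n, d, psi, weight=comb):
    """Return a list of n bits with d ones encoding psi, 0 <= psi < weight(n, d)."""
    word = []
    while d > 0:
        t = 1
        while True:
            if t > n - d + 1:
                raise ValueError("psi is out of range")
            w = weight(n - t, d - 1)    # words whose first '1' is at position t
            if psi < w:
                break
            psi -= w
            t += 1
        word += [0] * (t - 1) + [1]
        n, d = n - t, d - 1
    return word + [0] * n

def enum_decode(word, weight=comb):
    """Inverse of enum_encode for the same weight function."""
    n, d, pos, psi = len(word), sum(word), 0, 0
    while d > 0:
        t = word.index(1, pos) - pos + 1       # offset of the next '1'
        psi += sum(weight(n - s, d - 1) for s in range(1, t))
        pos, n, d = pos + t, n - t, d - 1
    return psi

# Round trip over all messages for n = 6, d = 3.
exact_words = [enum_encode(6, 3, psi) for psi in range(comb(6, 3))]
approx_words = [enum_encode(6, 3, psi, approx_weight)
                for psi in range(approx_weight(6, 3))]
print(len(exact_words), len(approx_words))
```

With the approximate weights fewer messages can be encoded for the same $n$ and $d$, which is the rate penalty paid for cheaper arithmetic.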

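We must now show that the procedure is valid, namely, 
that given a valid input, we produce a valid output. For our 
procedure, this reduces to showing two things: 1) If the stopping 
condition is not met, a recursive call will be made. 2) The 
recursive call is given valid parameters as well. Namely, in 
the recursive call, $\psi$ is non-negative. Also, for the encoding 
to be invertible, we must further require that 3) $\left\lceil {n \atop 0} \right\rceil = 1$ for 
$n \ge 0$. 

Condition 2 is clearly met, because of the check in line 4. 
Denote 

$$\left\langle {\kappa \atop \chi} \right\rangle = \left( \widehat{\kappa!} * \hat{f}(\chi) \right) \div \left( \widehat{\chi!} * \widehat{(\kappa - \chi)!} \right)$$

(and so, $\left\lceil {\kappa \atop \chi} \right\rceil = \left\lceil \left\langle {\kappa \atop \chi} \right\rangle \right\rceil$). Condition 3 follows from the next 
lemma. 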

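Lemma 9: Fix $0 \le \delta \le n$. Then, 

$$\binom{n}{\delta} \left( 1 - \frac{32(\delta + 1)}{2^{\mu}} \right) \le \left\langle {n \atop \delta} \right\rangle \le \binom{n}{\delta} \left( 1 - \frac{32\delta}{2^{\mu}} \right) .$$

Proof: The proof is essentially repeated invocations of 
(16) on the various stages of computation. We leave the details 
to the reader. ■ 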

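Finally, Condition 1 follows easily from the next lemma. 

Lemma 10: Fix $0 < \delta \le n$. Then, 

$$\left\lceil {n \atop \delta} \right\rceil \le \sum_{t=1}^{n-\delta+1} \left\lceil {n-t \atop \delta-1} \right\rceil .$$

Proof: The claim will follow if we show that 

$$\left\langle {n \atop \delta} \right\rangle \le \sum_{t=1}^{n-\delta+1} \left\langle {n-t \atop \delta-1} \right\rangle .$$

This is immediate from Lemma 9 and the binomial identity 

$$\sum_{t=1}^{n-\delta+1} \binom{n-t}{\delta-1} = \binom{n}{\delta} .$$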
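
For completeness, the chain of inequalities behind Lemma 10 can be spelled out as follows. This is a sketch under stated assumptions: $\left\langle {\kappa \atop \chi} \right\rangle$ denotes the floating-point value $(\kappa! * f(\chi)) \div (\chi! * (\kappa - \chi)!)$ before it is rounded up, $\lceil \cdot \rceil$ rounds up to an integer, and Lemma 9 bounds $\left\langle {n \atop \delta} \right\rangle$ between $\binom{n}{\delta}(1 - 32(\delta+1)/2^{\mu})$ and $\binom{n}{\delta}(1 - 32\delta/2^{\mu})$. The upper bound is applied with parameters $(n, \delta)$ and the lower bound with parameters $(n - t, \delta - 1)$:

```latex
\begin{align*}
\left\langle {n \atop \delta} \right\rangle
  &\le \binom{n}{\delta} \left( 1 - \frac{32\delta}{2^{\mu}} \right)
   = \sum_{t=1}^{n-\delta+1} \binom{n-t}{\delta-1}
     \left( 1 - \frac{32\bigl((\delta-1)+1\bigr)}{2^{\mu}} \right) \\
  &\le \sum_{t=1}^{n-\delta+1} \left\langle {n-t \atop \delta-1} \right\rangle ,
\end{align*}
% Rounding up preserves the inequality, since the ceiling of a sum
% is at most the sum of the ceilings:
\begin{align*}
\left\lceil \left\langle {n \atop \delta} \right\rangle \right\rceil
  \le \left\lceil \sum_{t=1}^{n-\delta+1}
        \left\langle {n-t \atop \delta-1} \right\rangle \right\rceil
  \le \sum_{t=1}^{n-\delta+1}
        \left\lceil \left\langle {n-t \atop \delta-1} \right\rangle \right\rceil .
\end{align*}
```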



Note that the penalty in terms of rate one suffers because 
of using our procedure (instead of plain enumerative coding) 
is negligible. Namely, $\log_2 \left\lceil {n \atop \delta} \right\rceil$ can be made arbitrarily close 
to $\log_2 \binom{n}{\delta}$. Since we take $e = O(\log n)$ and $\mu = O(\log \delta)$, 
we can show by amortized analysis that the running time 
of the procedure is $O(n \log^2 n)$. Specifically, see [8, Section 
17.3], and take the potential of the binary vector corresponding 
to $\psi$ as the number of entries in it that are equal to '0'. 
The decoding procedure is a straightforward "reversal" of the 
encoding procedure, and its running time is also $O(n \log^2 n)$. 

VIII. Appendix 

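Proof of Theorem [T]: Let $\tilde{\Delta}$ be as in (O, where we replace 
$d_{i,j}$ by $p_{i,j}$ and the $i$th row sum of $D$ by $p_i$. By the combinatorial interpretation 
of (5), and the fact that $d_{i,j} \ge p_{i,j}$ for all $i, j \in V$, it easily 
follows that $\Delta \ge \tilde{\Delta}$. Thus, 

$$R(D) \ge \frac{\lfloor \log_2 \tilde{\Delta} \rfloor}{M} = \frac{M'}{M} \cdot \frac{\lfloor \log_2 \tilde{\Delta} \rfloor}{M'} .$$

Denote by $e$ the base of natural logarithms. By Stirling's 
formula we have 

$$\log_2(t!) = t \log_2(t/e) + O(\log t) ,$$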

and from (5) we get that 

$$\log_2 \tilde{\Delta} = \sum_{i \in V} p_i \log_2(p_i/e) - \sum_{i,j \in V} p_{i,j} \log_2(p_{i,j}/e) + \sum_{i,j \in V} p_{i,j} \log_2(a_{i,j}) - O(|V|^2 \log M) .$$

By and Q, 

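$$\sum_{i,j \in V} p_{i,j} \log_2(a_{i,j}) = \sum_{i,j \in V} \rho_{i,j} \log_2(a_{i,j}) - O\left( |V|^2 \log_2(a_{\max}/a_{\min}) \right) .$$

Since $\sum_{j \in V} p_{i,j} = p_i$, we have 

$$\sum_{i \in V} p_i \log_2(p_i/e) - \sum_{i,j \in V} p_{i,j} \log_2(p_{i,j}/e) = \sum_{i \in V} p_i \log_2(p_i) - \sum_{i,j \in V} p_{i,j} \log_2(p_{i,j}) .$$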
Moreover, by ^ and the RHS of the last equation equals 

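$$\sum_{i \in V} \rho_i \log_2(\rho_i) - \sum_{i,j \in V} \rho_{i,j} \log_2(\rho_{i,j}) - O(|V|^2) .$$

We conclude that 

$$\log_2 \tilde{\Delta} = \sum_{i \in V} \rho_i \log_2(\rho_i) - \sum_{i,j \in V} \rho_{i,j} \log_2(\rho_{i,j}) + \sum_{i,j \in V} \rho_{i,j} \log_2(a_{i,j}) - O\left( |V|^2 \log(M \cdot a_{\max}/a_{\min}) \right) .$$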

Lastly, recall that $\rho_i = M' \pi_i$ and $\rho_{i,j} = \rho_i q_{i,j}$. Thus, 

$$\log_2 \tilde{\Delta} = M' H(\mathcal{P}) - O\left( |V|^2 \log(M \cdot a_{\max}/a_{\min}) \right) ,$$

where $H(\mathcal{P})$ is the entropy of the stationary Markov chain 
$\mathcal{P}$ with transition matrix $Q$. Recall that $\mathcal{P}$ was selected to be 
maxentropic: $H(\mathcal{P}) = \mathrm{cap}(S(G))$. This fact, along with ^ 
and a short calculation, finishes the proof. ■ 